Decision fusion by boosting method for multi-modal voice activity detection

Authors

  • Shin'ichi Takeuchi
  • Takashi Hashiba
  • Satoshi Tamura
  • Satoru Hayamizu
Abstract

In this paper, we propose a multi-modal voice activity detection (VAD) system that uses audio and visual information. In multi-modal (speech) signal processing, there are two methods for fusing the audio and visual information: concatenating the audio and visual features, or employing audio-only and visual-only classifiers and then fusing the unimodal decisions. We investigate the effectiveness of decision fusion based on the results from AdaBoost. AdaBoost is a machine learning method that constructs an effective classifier by combining weak classifiers; it classifies input data into two classes based on the weighted results from the weak classifiers. In the proposed method, this fusion scheme is applied to the decision fusion of multi-modal VAD. Experimental results show that the proposed method is generally more effective.
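
The weighted combination of weak classifiers described above can be illustrated with a minimal sketch of discrete AdaBoost. The abstract does not say which weak classifiers the authors use, so this sketch assumes each weak classifier is a simple threshold on one modality's score; the function names, the toy scores, and the labels (+1 for speech, -1 for non-speech) are all hypothetical and chosen only for illustration.

```python
import math

def train_adaboost(scores, labels, n_rounds=3):
    """Discrete AdaBoost over threshold stumps (an assumed weak learner).

    scores: list of (audio_score, visual_score) tuples, one per frame.
    labels: +1 for speech, -1 for non-speech.
    Returns a list of (alpha, modality, threshold, polarity) tuples.
    """
    n = len(scores)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        best = None
        # Search every stump: one modality, one threshold, one polarity.
        for m in range(2):
            for thr in sorted({s[m] for s in scores}):
                for pol in (1, -1):
                    preds = [pol if s[m] >= thr else -pol for s in scores]
                    err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                    if best is None or err < best[0]:
                        best = (err, m, thr, pol, preds)
        err, m, thr, pol, preds = best
        # Clamp the error so the log below stays finite.
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, m, thr, pol))
        # Increase the weight of misclassified frames, then renormalize.
        weights = [w * math.exp(-alpha * y * p) for w, p, y in zip(weights, preds, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, sample):
    """Fuse the weak decisions by the sign of their alpha-weighted sum."""
    vote = sum(alpha * (pol if sample[m] >= thr else -pol)
               for alpha, m, thr, pol in ensemble)
    return 1 if vote >= 0 else -1

# Hypothetical toy frames: the audio score is unreliable on two frames
# (as it might be under acoustic noise), the visual score is not.
scores = [(0.9, 0.8), (0.8, 0.7), (0.4, 0.9), (0.3, 0.2), (0.2, 0.1), (0.7, 0.3)]
labels = [1, 1, 1, -1, -1, -1]
ensemble = train_adaboost(scores, labels)
```

On this toy data the boosting rounds put their weight on the visual threshold, so a frame with a misleading audio score is still classified by the more reliable modality. In a real system the weak classifiers would be the trained audio-only and visual-only detectors rather than raw thresholds.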

Similar resources

Voice activity detection based on fusion of audio and visual information

In this paper, we propose a multi-modal voice activity detection (VAD) system that uses audio and visual information. Audio-only VAD systems typically are not robust to (acoustic) noise. Incorporating visual information, for example information extracted from mouth images, can improve the robustness since the visual information is not affected by the acoustic noise. In multi-modal (speech) signa...

An Augmented Multi-tiered Classifier for Instantaneous Multi-modal Voice Activity Detection

As mobile devices, intelligent displays, and home entertainment systems permeate digital markets, the desire for users to interact through spoken and visual modalities similarly grows. Previous interactive systems limit voice activity detection (VAD) to the acoustic domain alone, but the incorporation of visual features has shown great improvement in performance accuracy. When employing both ac...

Multimedia Evidence Fusion for Video Concept Detection via OWA Operator

We present a novel multi-modal evidence fusion method for high-level feature (HLF) detection in videos. The uni-modal features, such as color histograms, transcript texts, etc., tend to capture different aspects of HLFs and hence share complementariness and redundancy in modeling the contents of such HLFs. We argue that such inter-relations are key to effective multi-modal fusion. Here, we formulat...

Damage detection of multi-girder bridge superstructure based on the modal strain approaches

The research described in this paper focuses on the application of modal strain techniques on a multi-girder bridge superstructure with the objectives of identifying the presence of damage and detecting false damage diagnosis for such structures. The case study is a one-third scale model of a slab-on-girder composite bridge superstructure, comprised of a steel-free concrete deck with FRP rebars...

On the Improvements of Uni-modal and Bi-modal Fusions of Speaker and Face Recognition for Mobile Biometrics

The MOBIO database provides a challenging test-bed for speaker and face recognition systems because it includes voice and face samples as they would appear in forensic scenarios. In this paper, we investigate uni-modal and bimodal multi-algorithm fusion using logistic regression. The source speaker and face recognition systems were taken from the 2013 speaker and face recognition evaluations th...

Publication date: 2010